-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Provide _is_utf8_charset()
in compat.php
for early use.
#7052
Conversation
When `is_utf8_charset()` was introduced, the `mb_strlen()` and `mb_substr()` compat functions were modified to call it, but they are defined in `compat.php` before `is_utf8_charset()` is defined in `functions.php`. Certain code calling these compat functions early in the boot process before `functions.php` is included and on hosts without the multi-byte extension would thus crash. In this patch the `is_utf8_charset()` function is split into pure and stateful components. The pure version is recreated as `_is_utf8_charset()` and defined in `compat.php` while the existing function (which defaults to calling `get_option( 'blog_charset' )`) is left in place in `functions.php`. This ensures that code calling it will be able to call a form of the function even in early sequences. Follow-up to [58169]. Props dmsnell, donncha, hellofromTonya, jeherve, slyall, spacedmonkey. Fixes #61680.
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN:
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks like it will fix the issue. I've been unsuccessful reproducing the original issue so far, so it's difficult for me to confirm the fix.
@@ -40,6 +40,40 @@ function _wp_can_use_pcre_u( $set = null ) { | |||
return $utf8_pcre; | |||
} | |||
|
|||
/** | |||
* Indicates if a given slug for a character set represents the UTF-8 text encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other functions in this file mention that they're for internal use, like this:
Internal compat function to mimic mb_substr().
Should we mention that this is an internal function only intended to prevent load-order issues and that is_utf8_charset
should be used in most cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but also I don't see a reason why this shouldn't be made available 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_utf8_charset
is the API we expect folks to use, not this new function. To me, that's a good reason this should include @ignore
and be noted as an internal compact function
I also think we need a note for why this is here since it's not a compat function like everything else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested this and done code review. I'm approving on those grounds.
I haven't been able to reproduce, so I haven't confirmed the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should fix the issue as reported in Automattic/jetpack#38358. However, a similar problem can happen any time a compat function in this file is called by code prior to when functions.php
is loaded. For example, if _mb_substr()
is called with $encoding
set to null
then a similar undefined function error would be triggered when get_option()
is called, since the option.php
file is loaded at the top of functions.php
.
Maybe a better strategy for these polyfills is to always do a function_exists()
check before calling any WordPress functions that might be loaded after these polyfills are available and providing a graceful fallback when the function is not yet loaded.
Thanks @joemcgill - I also noted the apparent call to My initial reaction was that perhaps we could create a new CI task to check if anything in In any case the situations where these things run is rare enough. I would imagine that we could get away with the runtime checks and have it not matter much. IMO, good work to discuss in an enhancement. |
Totally agree that handling that separately makes sense.
This is the tricky part. While I appreciate not calling |
Thanks @joemcgill
Would you care to share more about this? From my perspective, fixing this makes the code better by introducing this split, particularly since The only complication I see is that it's in I considered duplicating the logic of this function here directly, but given that it was necessary in both of the Do you see this as a blocker to the patch as proposed? Without trying to derail this conversation, I'm only more convinced each week that a good future direction for Core is UTF-8 everywhere, and to that end, some UTF-8 functionality is missing, and there have been many problems lately related to whether the multibyte extension is available. I've been exploring a native UTF-8 handling system in #6883 and think it has merit. Similar to how the HTML API has replaced in user-space some PHP-provided functionality, incorporating our own UTF-8 encoding and decoding would let us finally get rid of a lot of needless and inconsistent checks and conversions. I've been exploring some of this code inside Automattic too. When I feel it's ready for actual discussion I'll start posting on Make about it. |
Sure! Besides the use of an option value, I don't see that maintainability concern as a blocker. However, the other problem that I just noticed when looking at this again, is that now any time the |
I think there are two points (in addition to the UTF-8 everwhere one):
I think we should focus on 2 right now and create a separate core ticket and one where we can at a minimum include some documentation at the top of the file and ideally include some automated tests as well. |
@@ -40,6 +40,40 @@ function _wp_can_use_pcre_u( $set = null ) { | |||
return $utf8_pcre; | |||
} | |||
|
|||
/** | |||
* Indicates if a given slug for a character set represents the UTF-8 text encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_utf8_charset
is the API we expect folks to use, not this new function. To me, that's a good reason this should include @ignore
and be noted as an internal compact function
I also think we need a note for why this is here since it's not a compat function like everything else
@@ -7503,20 +7503,7 @@ function get_tag_regex( $tag ) { | |||
* @return bool Whether the slug represents the UTF-8 encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should include an @see _is_utf8_charset
since this is now essentially a wrapper function
Follow up ticket: https://core.trac.wordpress.org/ticket/61694 |
Trac ticket: Core-61680
When
is_utf8_charset()
was introduced, themb_strlen()
andmb_substr()
compat functions were modified to call it, but they are defined incompat.php
beforeis_utf8_charset()
is defined infunctions.php
.Certain code calling these compat functions early in the boot process before
functions.php
is included and on hosts without the multi-byte extension would thus crash.In this patch the
is_utf8_charset()
function is split into pure and stateful components. The pure version is recreated as_is_utf8_charset()
and defined incompat.php
while the existing function (which defaults to callingget_option( 'blog_charset' )
) is left in place infunctions.php
. This ensures that code calling it will be able to call a form of the function even in early sequences.Follow-up to [58169].
Props dmsnell, donncha, hellofromTonya, jeherve, slyall, spacedmonkey.
Fixes #61680.